Spoken Portuguese: Geographic and Social Varieties
نویسندگان
چکیده
The Spoken Portuguese: Geographic and Social Varieties project has as its main goal the Portuguese teaching as foreign language. The idea is to provide a collection of authentic spoken texts and to make it friendly usable. Therefore, a selection of spontaneous oral data was made, using either already compiled material or material recorded for this purpose. The final corpus constitution resulted in a representative sample that includes European, Brazilian and African Portuguese, as well as Macau and East-Timor Portuguese. In order to accomplish a functional product the Linguistics Center of Lisbon University developed a sound/text alignment software. The final result is a CD-ROM collection that contains 83 text files, 83 sound files and 83 files produced by the sound/text alignment tool. This independence between sound and text files allows the CD-ROM user to manipulate it for other purposes than the educational one.
منابع مشابه
A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese
One of the major obstacles in deploying spoken language technologies (SLTs) in the developing world is a lack of key linguistic resources – e.g. electronic dictionaries, phonetically aligned corpora, pronunciation lexicons, etc. – that describe the non-dominant varieties spoken in such countries and regions. In this paper, we describe the work of the LUPo (Portuguese Unisyn Lexicon) project to ...
متن کاملAligning and recognizing spoken books in different varieties of Portuguese
This paper tries to present digital spoken books as a useful diagnostic tool for detecting alignment and recognition problems and for studying the porting of these technologies to different varieties of the same language Portuguese, in our case. We summarize the main differences between European and Brazilian Portuguese (EP/BP) and describe how they affect the GtoP system. Despite the small siz...
متن کاملThe African Varieties of Portuguese: Compiling Comparable Corpora and Analyzing Data-Derived Lexicon
“Linguistic Resources for the Study of the Portuguese African Varieties” is an ongoing project that aims at the constitution, treatment, analysis and availability of a corpus of the African varieties of Portuguese, with 3 million words of written and spoken texts, constituted by five comparable subcorpora, corresponding to the varieties of Angola, Cape Verde, Guinea-Bissau, Mozambique and Sao T...
متن کاملExploiting variety-dependent Phones in Portuguese Variety Identification
This paper presents a new approach of building a language identification system using a specialized Phone Recognition system followed by Language Modeling (PRLM) to differentiate Portuguese varieties spoken in African Countries from European Portuguese. The system is designed to focus on exploiting the phonotactic information of a single discriminatively trained tokenizer for the specific pair ...
متن کاملLanguage and variety verification on broadcast news for Portuguese
This paper describes a language/accent verification system for Portuguese, that explores different type of properties: acoustic, phonotactic and prosodic. The two-stage system is designed to be used as a pre-processing module for the Portuguese Automatic Speech Recognition (ASR) system developed at INESC-ID. As the ASR system is applied everyday to transcribe the evening news from a Portuguese ...
متن کامل